
@deepfates (Owner)

What

  • Move monolithic CLI into modular structure:
    • src/core/types.ts: shared types, arg parsing, logger, utils
    • src/sources/twitter.ts: ingest + detect for Twitter/X archives
    • src/transforms/core.ts: cleanText, filters, grouping, conversation mapping
    • src/outputs/writers.ts: markdown, oai, jsonl, sharegpt, stats
    • src/cli/splice.ts: CLI wiring with existing flags/behavior
  • Add tsconfig.json and update package.json (bin now dist/cli/splice.js)
  • Keep top-level splice.ts as a forwarder for dev compatibility
  • Add public library API (src/index.ts) and export in package.json
  • Update README with architecture and library usage

Why

  • Establish clear sources → transforms → outputs architecture
  • Make it easy to add new inputs and outputs, and later add checkpoints
  • Enable external consumers to import modules and plug in proprietary adapters
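A minimal sketch of how the sources → transforms → outputs split could look in code. Every name here (the ContentItem fields, SourceAdapter, Transform, OutputWriter, runPipeline) is hypothetical. Only the detect/ingest pairing for sources comes from the description above, and the actual exports in src/index.ts may differ.

```typescript
// Hypothetical shapes for illustration; the real exports in src/index.ts may differ.
export interface ContentItem {
  id: string;
  text: string;
  createdAt: string;
  parentId?: string; // e.g. the post being replied to, usable for thread grouping
  raw?: unknown; // original record from the archive
}

// A source recognizes and reads one kind of input (e.g. a Twitter/X archive).
export interface SourceAdapter {
  name: string;
  detect(input: string): Promise<boolean>;
  ingest(input: string): Promise<ContentItem[]>;
}

// Transforms are pure functions over the item list (cleaning, filtering, grouping).
export type Transform = (items: ContentItem[]) => ContentItem[];

// A writer serializes items into one output format and returns the text.
export interface OutputWriter {
  format: string;
  write(items: ContentItem[]): Promise<string>;
}

export async function runPipeline(
  input: string,
  sources: SourceAdapter[],
  transforms: Transform[],
  writers: OutputWriter[],
): Promise<Map<string, string>> {
  // Use the first source whose detect() accepts the input.
  let source: SourceAdapter | undefined;
  for (const s of sources) {
    if (await s.detect(input)) {
      source = s;
      break;
    }
  }
  if (source === undefined) {
    throw new Error(`No source adapter recognizes ${input}`);
  }

  let items = await source.ingest(input);
  for (const t of transforms) items = t(items);

  // Every writer receives the same transformed items; results are keyed by format.
  const outputs = new Map<string, string>();
  for (const w of writers) outputs.set(w.format, await w.write(items));
  return outputs;
}
```

With this shape, a proprietary input plugs in as one more SourceAdapter and a new format as one more OutputWriter, without touching runPipeline.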

Compatibility

  • CLI flags and outputs unchanged
  • Node 18+, ESM

Tests

  • Build succeeds
  • Integration tests passing:
    • tests/integration/basic.test.ts
    • tests/integration/media.test.ts

@deepfates requested a review from Copilot on October 18, 2025 23:48

Copilot AI left a comment


Pull Request Overview

This PR refactors the monolithic CLI into a modular pipeline architecture with a clear separation of concerns: sources → transforms → outputs, backed by shared core utilities and a thin CLI layer. The refactor lays a foundation for adding new input sources and output formats while keeping existing CLI flags and behavior backward compatible.

Key changes:

  • Extracted core types, utilities, and CLI arg parsing into src/core/types.ts
  • Created modular source adapter (src/sources/twitter.ts) for Twitter/X archive ingestion
  • Separated transform logic (src/transforms/core.ts) for filtering, grouping, and conversation mapping
  • Isolated output writers (src/outputs/writers.ts) for Markdown, OAI, JSONL, ShareGPT, and stats
  • Introduced public library API (src/index.ts) with pluggable adapter interfaces

Reviewed Changes

Copilot reviewed 8 out of 10 changed files in this pull request and generated 4 comments.

  • src/transforms/core.ts: text cleaning, filtering, thread grouping, and conversation-to-message conversion logic
  • src/sources/twitter.ts: Twitter/X archive detection and ingestion with media handling
  • src/outputs/writers.ts: writers for Markdown, OAI JSONL, normalized JSONL, ShareGPT, and stats formats
  • src/index.ts: public library API exposing types, adapters, and extension interfaces
  • src/core/types.ts: shared types, CLI argument parsing, logger, and utility functions
  • src/cli/splice.ts: CLI entrypoint orchestrating the modular pipeline
  • package.json: main entry point updated to dist/index.js for library usage
  • README.md: documentation expanded with an architecture overview and library usage examples
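For reference, OpenAI's chat fine-tuning format expects one JSON object per line, each with a messages array of role/content pairs. A minimal sketch of a writer in that shape, assuming conversations have already been mapped to messages; toOaiJsonl and its optional systemPrompt parameter are hypothetical and not taken from src/outputs/writers.ts.

```typescript
type Role = "system" | "user" | "assistant";

interface Message {
  role: Role;
  content: string;
}

/**
 * Serialize conversations as OpenAI-style chat JSONL: one {"messages": [...]} object per line.
 * Empty conversations are skipped; an optional system prompt is prepended to each one.
 */
export function toOaiJsonl(conversations: Message[][], systemPrompt?: string): string {
  const lines: string[] = [];
  for (const msgs of conversations) {
    if (msgs.length === 0) continue;
    const messages: Message[] = systemPrompt
      ? [{ role: "system", content: systemPrompt }, ...msgs]
      : msgs;
    lines.push(JSON.stringify({ messages }));
  }
  // Trailing newline so output files can be concatenated safely.
  return lines.length > 0 ? lines.join("\n") + "\n" : "";
}
```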


}

for (const it of items) {
const role: Role = it.raw && "full_text" in (it.raw as any) ? "assistant" : "user";

Copilot AI Oct 18, 2025


The role assignment casts it.raw to any and checks whether it has a full_text property. This implicit heuristic is fragile and unclear. Consider extracting it into a named function such as inferRole(item: ContentItem): Role, with documentation that explains the heuristic.
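A minimal sketch of that extraction, assuming ContentItem exposes an optional raw field of unknown type and that the heuristic itself stays exactly as written (a raw record with full_text maps to "assistant"). Why full_text distinguishes the two roles is not stated in the PR, so the doc comment only records the rule. The type guard also removes the as any cast; as a side effect it avoids the TypeError the original in check would throw if raw were ever a non-null primitive such as a string.

```typescript
type Role = "user" | "assistant";

// Minimal stand-in for the real ContentItem; only raw matters here.
interface ContentItem {
  raw?: unknown;
}

// Type guard: true only for non-null objects that carry a full_text key.
function hasFullText(raw: unknown): raw is { full_text: unknown } {
  return typeof raw === "object" && raw !== null && "full_text" in raw;
}

/**
 * Infer the chat role for an item.
 * Heuristic (unchanged from the inline check): items whose raw record has a
 * full_text field map to "assistant"; everything else maps to "user".
 */
export function inferRole(item: ContentItem): Role {
  return hasFullText(item.raw) ? "assistant" : "user";
}
```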

Copilot uses AI. Check for mistakes.
package.json Outdated
".": "./dist/index.js"
},
"bin": {
"splice": "dist/splice.js"

Copilot AI Oct 18, 2025


The bin entry points to 'dist/splice.js', but according to the PR description and README changes, the CLI should be at 'dist/cli/splice.js'. As written, the published package's CLI command will fail.

Suggested change
"splice": "dist/splice.js"
"splice": "dist/cli/splice.js"

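One way to catch this class of mistake before publishing is a small check that every bin target resolves to a file. The sketch below is hypothetical (missingBinTargets is not part of this repository) and assumes it runs from the package root after the build; it accepts both forms npm allows for bin, a single path string or a map from command name to path.

```typescript
import { existsSync } from "node:fs";
import { join } from "node:path";

interface PackageJson {
  name?: string;
  // npm accepts either a single path or a { command: path } map.
  bin?: string | Record<string, string>;
}

// Return every bin target that does not exist under root.
// exists is injectable so the check can be tested without touching the disk.
export function missingBinTargets(
  pkg: PackageJson,
  root: string,
  exists: (p: string) => boolean = existsSync,
): string[] {
  if (pkg.bin === undefined) return [];
  const targets = typeof pkg.bin === "string" ? [pkg.bin] : Object.values(pkg.bin);
  return targets.filter((t) => !exists(join(root, t)));
}
```

A prepublish script could read package.json, call this after the build, and exit non-zero when the result is non-empty.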
const validFormats = requested.filter((f) => allowedFormats.has(f));
const invalidFormats = requested.filter((f) => !allowedFormats.has(f));
for (const bad of invalidFormats) {
logger("warn", `Unknown format "${bad}". Supported: markdown, oai, json`);

Copilot AI Oct 18, 2025


The error message lists 'markdown, oai, json' as the supported formats but omits 'sharegpt', which is in the allowedFormats set (line 191). The message should list every supported format.

if (formatSpecified && validFormats.length === 0) {
logger(
"error",
"No valid formats requested. Supported: markdown, oai, json",

Copilot AI Oct 18, 2025


This error message also omits 'sharegpt' from the list of supported formats; it should be consistent with the allowedFormats set.
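Both messages can stay in sync by deriving the list from the set itself. A minimal sketch, assuming allowedFormats contains exactly markdown, oai, json, and sharegpt (as the review comments suggest) and that the caller decides what to do when no valid format remains; validateFormats, its Logger type, and its null return are hypothetical rather than the CLI's actual code.

```typescript
type Level = "info" | "warn" | "error";
type Logger = (level: Level, msg: string) => void;

// Assumed contents, per the review comments.
const allowedFormats = new Set(["markdown", "oai", "json", "sharegpt"]);
// Built once from the set, so both messages always match it.
const supportedList = Array.from(allowedFormats).join(", ");

// Returns the valid formats, or null when formats were requested but none are valid.
export function validateFormats(
  requested: string[],
  formatSpecified: boolean,
  logger: Logger,
): string[] | null {
  const validFormats = requested.filter((f) => allowedFormats.has(f));
  const invalidFormats = requested.filter((f) => !allowedFormats.has(f));
  for (const bad of invalidFormats) {
    logger("warn", `Unknown format "${bad}". Supported: ${supportedList}`);
  }
  if (formatSpecified && validFormats.length === 0) {
    logger("error", `No valid formats requested. Supported: ${supportedList}`);
    return null;
  }
  return validFormats;
}
```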

@deepfates merged commit 7fab0ac into main on Oct 18, 2025
9 checks passed
@deepfates deleted the refactor/pipeline-modular branch on October 18, 2025 23:59